Discovering Frequent Itemsets in the Presence of Highly Frequent Items

نویسندگان

  • Dennis P. Groth
  • Edward L. Robertson
چکیده

This paper presents new techniques for focusing the discovery of frequent itemsets within large, dense datasets containing highly frequent items. The existence of highly frequent items adds signi cantly to the cost of computing the complete set of frequent itemsets. Our approach allows for the exclusion of such items during the candidate generation phase of the Apriori algorithm. Afterwards, the highly frequent items can be reintroduced, via an inferencing framework, providing for a capability to generate frequent itemsets without counting their frequency. We demonstrate the use of these new techniques within the well-studied framework of the Apriori algorithm. Furthermore, we provide empirical results using our techniques on both synthetic and real datasets both relevant since the real datasets exhibit statistical characteristics di erent from the probabilistic assumptions behind the synthetic data. The source we used for real data was the U.S. Census.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

MINING FUZZY TEMPORAL ITEMSETS WITHIN VARIOUS TIME INTERVALS IN QUANTITATIVE DATASETS

This research aims at proposing a new method for discovering frequent temporal itemsets in continuous subsets of a dataset with quantitative transactions. It is important to note that although these temporal itemsets may have relatively high textit{support} or occurrence within particular time intervals, they do not necessarily get similar textit{support} across the whole dataset, which makes i...

متن کامل

A Survey of Frequent and Infrequent Weighted Itemset Mining Approaches

Itemset mining is a data mining method extensively used for learning important correlations among data. Initially itemsets mining was made on discovering frequent itemsets. Frequent weighted item set characterizes data in which items may weight differently through frequent correlations in data’s. But, in some situations, for instance certain cost functions need to be minimized for determining r...

متن کامل

An Algorithm for Mining Maximum Frequent Itemsets Using Data-sets Condensing and Intersection Pruning

Discovering maximal frequent itemset is a key issue in data mining; the Apriori-like algorithms use candidate itemsets generating/testing method, but this approach is highly time-consuming. To look for an algorithm that can avoid the generating of vast volume of candidate itemsets, nor the generating of frequent pattern tree, DCIP algorithm uses data-set condensing and intersection pruning to f...

متن کامل

Extracting Repitative Patterns from Fuzzy Temporal Data

Association rules mining from temporal dataset is to find associations between items that hold within certain time frame but not throughout the dataset. This problem involves first discovering frequent itemsets which are frequent at certain time intervals and then extracting association rules from such frequent itemsets. In practice, we may have datasets having imprecise or fuzzy time attribute...

متن کامل

An Efficient Algorithm to Automated Discovery of Interesting Positive and Negative Association Rules

Association Rule mining is very efficient technique for finding strong relation between correlated data. The correlation of data gives meaning full extraction process. For the discovering frequent items and the mining of positive rules, a variety of algorithms are used such as Apriori algorithm and tree based algorithm. But these algorithms do not consider negation occurrence of the attribute i...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001